refactor: simplify build to single so without CMakeLists.txt #113

Merged
GuoxiaWang merged 1 commit into PaddlePaddle:main from baoqiwen:bqw_fa3_handle_v2
Mar 5, 2026
Conversation

@baoqiwen baoqiwen commented Mar 4, 2026

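The 2-step build (cmake + setup.py) becomes a 1-step build (setup.py); CMakeLists.txt is deleted.
The 2 .so files are merged into 1, so no call crosses a .so boundary and the C ABI wrapper is deleted.

FLASHMASK_BUILD=fa4 → pure-Python install; does not import paddle and does not compile CUDA
FLASHMASK_BUILD=fa3 → compiles only the FA3 CUDA kernels and excludes the cute/ package
FLASHMASK_BUILD=all → default; installs both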
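
The three modes above amount to a small environment-variable switch. The following is a minimal sketch, not the PR's setup.py: the helper name `resolve_build_mode`, the whitespace/case normalization, and the error on unknown values are assumptions made for illustration. Only the variable name `FLASHMASK_BUILD`, its three values, and the `all` default come from the description.

```python
import os

VALID_MODES = ("fa3", "fa4", "all")


def resolve_build_mode(environ=None):
    """Hypothetical helper: map FLASHMASK_BUILD to (mode, build_fa3, build_fa4)."""
    environ = os.environ if environ is None else environ
    # Assumption: an unset variable means "all", as stated in the description.
    mode = environ.get("FLASHMASK_BUILD", "all").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"FLASHMASK_BUILD must be one of {VALID_MODES}, got {mode!r}"
        )
    build_fa3 = mode in ("fa3", "all")  # compile the FA3 CUDA kernels
    build_fa4 = mode in ("fa4", "all")  # install the pure-Python FA4 part
    return mode, build_fa3, build_fa4
```

Passing a plain dict instead of `os.environ` keeps the helper easy to check without touching the process environment, for example `resolve_build_mode({"FLASHMASK_BUILD": "fa4"})`.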

@baoqiwen baoqiwen force-pushed the bqw_fa3_handle_v2 branch 4 times, most recently from 5d4ffd1 to 6124f93 Compare March 4, 2026 13:28
const bool has_lt_start = lt_start_row_indices.defined();
const bool has_lt_end = lt_end_row_indices.defined();
const bool has_ut_start = ut_start_row_indices.defined();
const bool has_ut_end = ut_end_row_indices.defined();
Member

Why was this removed?

Author

Because has_lt_start and has_ut_end are not used, which triggers warnings later on.

BUILD_FA3 = FLASHMASK_BUILD in ('fa3', 'all')
BUILD_FA4 = FLASHMASK_BUILD in ('fa4', 'all')

VERSION = '4.0+g' + get_git_commit()
Member

Don't remove this; the whl name needs to carry the git information.

Author

done
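
For context, a version string of the form `'4.0+g' + get_git_commit()` uses a PEP 440 local version label, so the commit hash ends up in the wheel filename. The PR does not show the body of `get_git_commit`; the sketch below is a hypothetical stand-in that shells out to `git rev-parse`, and the `"unknown"` fallback and the `short_len` parameter are assumptions.

```python
import subprocess


def get_git_commit(short_len=7):
    """Hypothetical stand-in: short hash of HEAD, or 'unknown' outside a git checkout."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", f"--short={short_len}", "HEAD"],
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the source tree is not a repository.
        return "unknown"
    return out.decode("ascii").strip() or "unknown"


# Same shape as the line under review: base version plus a local label.
VERSION = "4.0+g" + get_git_commit()
```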

include_dirs=[
'flash_mask/flashmask_attention_v3/csrc',
'flash_mask/flashmask_attention_v3',
'flash_mask/flashmask_attention_v3/cutlass/include',
Member

When flashmask v1 is integrated later, we need to look at how to avoid cutlass version conflicts.

# FLASHMASK_BUILD=fa4 pip install -e . --no-build-isolation
# FLASHMASK_BUILD=fa3 pip install -e . --no-build-isolation
# pip install -e . --no-build-isolation # builds all
# ============================================================
Member

For releases, the whl is built this way:
python setup.py bdist_wheel

@@ -1,292 +0,0 @@
cmake_minimum_required(VERSION 3.9 FATAL_ERROR)
Member

Has all of the logic here been moved into setup.py?

Author

Yes.

flashmaskv3_clear_fwd_params_handle(params_handle);
Flash_fwd_params params_obj = {};
Flash_fwd_params *params_handle = &params_obj;
set_flashmaskv3_params_fprop(
Member

I think this params_handle could be removed entirely as a next step; most of the calls in the interface code can simply use params.member = var directly. That said, it is fine to merge this version as is first.

@baoqiwen baoqiwen force-pushed the bqw_fa3_handle_v2 branch from 6124f93 to e4ea622 Compare March 5, 2026 07:25
@baoqiwen baoqiwen force-pushed the bqw_fa3_handle_v2 branch from e4ea622 to 61caa9d Compare March 5, 2026 07:29
Member

@umiswing umiswing left a comment

LGTM

for split in split_suffixes:
for paged in paged_suffixes:
for softcap in softcap_fwd_suffixes:
for packgqa in packgqa_suffixes:
Member

This could be optimized with iteration later on.
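
One possible reading of this suggestion is to flatten the four nested loops with `itertools.product`. The sketch below is illustrative only: the suffix values are placeholders (the real lists live in the PR's setup.py), and the loop body, which here just concatenates the suffixes, is an assumption.

```python
from itertools import product

# Placeholder values; the real suffix lists are defined in the PR's setup.py.
split_suffixes = ["", "_split"]
paged_suffixes = ["", "_paged"]
softcap_fwd_suffixes = ["", "_softcap"]
packgqa_suffixes = ["", "_packgqa"]


def nested_names():
    """Original shape: four nested loops."""
    names = []
    for split in split_suffixes:
        for paged in paged_suffixes:
            for softcap in softcap_fwd_suffixes:
                for packgqa in packgqa_suffixes:
                    names.append(split + paged + softcap + packgqa)
    return names


def product_names():
    """Flattened shape: product() yields combinations in the same order as the nested loops."""
    combos = product(
        split_suffixes, paged_suffixes, softcap_fwd_suffixes, packgqa_suffixes
    )
    return ["".join(parts) for parts in combos]
```

Because `product` varies the rightmost iterable fastest, the flattened version visits combinations in the same order as the nested loops, so the generated file list would not be reordered.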

@GuoxiaWang GuoxiaWang merged commit 0f6fc4d into PaddlePaddle:main Mar 5, 2026
1 check passed